Poster: Alex.brollo Date: Feb 25, 2014 3:30pm
Forum: forums Subject: Re: Pypi internetarchive module

I'm now working on a tricky step: automatically building a decent set of metadata, scraped from the source website. As a result, some of my recent uploads made with the internetarchive module have mismatched metadata. I'll review those uploads and fix the metadata; some sets of images also need manual cropping.

Please bear with any mistakes I make while I'm testing my scripts. I'm near the limits of my skill.

Poster: Jeff Kaplan Date: Feb 25, 2014 5:38pm
Forum: forums Subject: Re: Pypi internetarchive module

if you hit a wall, send email to info@archive.org and we might be able to assist.

Poster: Alex.brollo Date: Feb 25, 2014 10:46pm
Forum: forums Subject: Re: Pypi internetarchive module

Thanks! For now, I need to know two details. The simpler one:

1. What is the right encoding for metadata text: utf-8 or unicode?

The harder one:
2. OCR of old Italian text is unreliable, because of both unusual characters and variant spellings of many words. Human editing is the key step toward a good result (this is the purpose of Wikisource, of Distributed Proofreaders, and I guess of others that I don't know). A high-quality djvu is very useful when proofreading; I see that aggressive djvu compression of color images greatly lowers the quality of the derived djvu. Can djvu compression be tuned somehow?

Poster: Jeff Kaplan Date: Feb 26, 2014 9:11am
Forum: forums Subject: Re: Pypi internetarchive module

1. utf-8 is fine.

2. if you upload a djvu that's properly named {identifier}.djvu, the item will not derive one to replace it. as for our compression, it is set by the engineer most familiar with the vagaries of uploaded texts, to find a happy medium.
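
A minimal Python sketch of the two points above, under stated assumptions: metadata values are kept as text that encodes cleanly to utf-8, and a self-made djvu is given the remote name {identifier}.djvu so that no derived one replaces it. The helper names `prepare_metadata` and `djvu_remote_name` are hypothetical and not part of the internetarchive module; the NFC normalization step is an extra precaution, not something the thread requires.

```python
import unicodedata


def prepare_metadata(raw):
    """Return a copy of `raw` whose values are NFC-normalized text
    that is guaranteed to encode to utf-8 (hypothetical helper)."""
    clean = {}
    for key, value in raw.items():
        # Accept utf-8 byte strings (e.g. scraped from a web page) as well as text.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        text = unicodedata.normalize("NFC", str(value)).strip()
        # Fail early if the text cannot be represented as utf-8.
        text.encode("utf-8")
        clean[key] = text
    return clean


def djvu_remote_name(identifier):
    """Remote file name that stops the item deriving its own djvu."""
    return f"{identifier}.djvu"


# Assumed usage with the internetarchive module (not executed here):
#   from internetarchive import upload
#   upload(identifier,
#          files={djvu_remote_name(identifier): "local/path/to/book.djvu"},
#          metadata=prepare_metadata(scraped_metadata))
```

The upload call is left as a comment because it needs credentials and network access; only the two helpers run as written.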

Poster: Alex.brollo Date: Feb 26, 2014 2:15pm
Forum: forums Subject: Re: Pypi internetarchive module

1. ok, I'll test utf-8.

2. Wow! This means that .... ok, let's refine the present project before starting a new one.